AnCora-UPF: A Multi-Level Annotation of Spanish

نویسندگان

  • Simon Mille
  • Alicia Burga
  • Leo Wanner
چکیده

There is an increasing need for the annotation of multiple types of linguistic information that are rather different in their nature, e.g., word order, morphological features, syntactic and semantic relations, etc. Quite frequently, their annotation is combined in a single structure, which not only results in inadequate annotations of treebanks and consequent low-quality applications trained on them, but also is deficient from a theoretical (linguistic) perspective. We present a new corpus of Spanish annotated on four independent levels, morphology, surface-syntax, deep-syntax and semantics, as well as the methodology that allows for obtaining it with fewer cost while maintaining a high inter-annotator agreement.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semantic Annotation of Deverbal Nominalizations in the Spanish AnCora corpus

This paper presents the methodology and the linguistic criteria followed to enrich the AnCora-Es corpus with the semantic annotation of deverbal nominalizations. The first step was to run two independent automated processes: one for the annotation of denotation types and another one for the annotation of argument structure. Secondly, we manually checked both types of information and measured in...

متن کامل

Semantic Annotation of Deverbal Nominalizations in the Spanish corpus AnCora_reference_aligned_to_the_left

This paper presents the methodology and the linguistic criteria followed to enrich the AnCora-Es corpus with the semantic annotation of deverbal nominalizations. The first step was to run two independent automated processes: one for the annotation of denotation types and another one for the annotation of argument structure. Secondly, we manually checked both types of information and measured in...

متن کامل

AnCora-Verb: A Lexical Resource for the Semantic Annotation of Corpora

In this paper we present two large-scale verbal lexicons, AnCora-Verb-Ca for Catalan and AnCora-Verb-Es for Spanish, which are the basis for the semantic annotation with arguments and thematic roles of AnCora corpora. In AnCora-Verb lexicons, the mapping between syntactic functions, arguments and thematic roles of each verbal predicate it is established taking into account the verbal semantic c...

متن کامل

AnCora: Multilevel Annotated Corpora for Catalan and Spanish

This paper presents AnCora, a multilingual corpus annotated at different linguistic levels consisting of 500,000 words in Catalan (AnCora-Ca) and in Spanish (AnCora-Es). At present AnCora is the largest multilayer annotated corpus of these languages freely available from http://clic.ub.edu/ancora. The two corpora consist mainly of newspaper texts annotated at different levels of linguistic desc...

متن کامل

Tags Re-ranking Using Multi-level Features in Automatic Image Annotation

Automatic image annotation is a process in which computer systems automatically assign the textual tags related with visual content to a query image. In most cases, inappropriate tags generated by the users as well as the images without any tags among the challenges available in this field have a negative effect on the query's result. In this paper, a new method is presented for automatic image...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013